ABSTRACT

This paper describes two practical fusion techniques (hybrid fusion and cued fusion) for automatic target cueing that
combine features derived from each sensor's data at the object level. In the hybrid fusion method, the data from each
input sensor is prescreened (i.e., Automatic Target Cueing (ATC) is performed) before the fusion stage. The cued
fusion method assumes that one of the sensors is designated as a primary sensor, and thus ATC is only applied to its
input data. If one of the sensors exhibits a higher probability of detection (Pd) and/or a lower false alarm rate, it can be selected as the primary
sensor. However, if the ground coverage can be segmented into regions in which one of the sensors is known to
exhibit  better  performance,  then  the  cued  fusion  can  be  applied  locally/adaptively  by  switching  the  choice  of  a 
primary  sensor.  Otherwise,  the  cued  fusion  is  applied  both  ways  (each  sensor  as  primary)  and  the  outputs  of  each 
cued  mode  are  combined.  Both  fusion  approaches  use  a  back-end  discrimination  stage  that  is  applied  to  a  combined 
feature  vector  to  reduce  false  alarms.  The  two  fusion  processes  were  applied  to  spectral  and  radar  sensor  data  and 
were  shown  to  provide  substantial  false  alarm  reduction.  The  approaches  are  easily  extendable  to  more  than  two 
sensors. 


1.0  INTRODUCTION 

Recent  crisis  and  conflict  operations  have  reinforced  the  need  for  broad  area  imagery  coverage  to  support  all  stages 
of  operations.  The  first  utility  of  imagery  usually  takes  the  form  of  target  detection/recognition  and  change 
detection.  The  ability  to  reliably  and  rapidly  detect,  discriminate  and  classify  military  targets  can  provide  a 
significant  tactical  advantage  in  the  battlefield.  Automatic  target  detection  and  recognition  (ATD/ATR)  has  been  a 
focus  of  research  for  the  last  two  decades.  The  performance  of  automatic  target  recognition  has  not  yet  reached  the 
required  level  of  recognition  accuracy  and  speed.  The  complexity  of  the  recognition  process  has  forced  the 
development  of  automatic  target  prescreening  technologies  in  order  to  cue  the  complex/time-consuming  recognition 
stage to limited/reduced data. Currently, ATR processes are primarily used as a second layer for reduction of false
cues (i.e., target/no-target decision), leaving the actual recognition to a human operator.

The  limitations  (high  probability  of  detection/recognition  at  an  unacceptable  level  of  false  alarms)  of  current  systems 
utilizing  a  single  sensing  domain  in  addressing  the  various  deployed  CC&D  techniques  have  led  to  the  incorporation 
of  multiple  sensors.  It  is  expected  that  the  result  of  fusing  data  from  multiple  independent  sensors  will  offer  the 
potential  for  better  performance  than  can  be  achieved  by  either  sensor,  and  will  reduce  vulnerability  to  sensor- 
specific  countermeasures  and  deployment  factors.  The  first  and  most  significant  increment  in  performance 
improvement  will  come  from  multi-source  fusion  at  the  prescreening  (target  detection)  stage.  This  will  enable  either 
the  use  of  current  recognition  processing  or  will  allow  the  application  of  more  complex  processes  to  a  reduced  set  of 
detections. 

Multi/hyper-spectral  (MS/HS)  sensors  are  examples  of  the  trend  to  add  dimensionality  (in  this  case  -  spectral)  in 
order  to  achieve  significant  performance  gains.  This  trend  is  reflected  in  plans  to  host  multiple  pods  such  as  SAR, 
EO/IR, and multi-spectral sensors on UAVs. Multi/hyper-spectral and SAR sensor data offer the most potential for
defeating CC&D. SAR sensors have the advantage of excellent standoff, all-weather capabilities, and broad area
coverage. However, the current operating frequencies of SAR sensors limit their ability to detect targets
embedded  in  deep  foliage  clutter.  As  a  result  a  significant  effort  has  been  devoted  to  the  development  and 
demonstration  of  Foliage  Penetration  SAR  sensors  with  the  corresponding  processing  techniques. 

Multi- and hyper-spectral sensors provide complementary information by measuring solar reflection and thermal
emission across targets and backgrounds. The result is multiple 2D projections of the object and its internal attributes
(as  a  function  of  wavebands)  onto  a  plane  perpendicular  to  the  line  of  sight.  These  sensors  can  provide  high 
definition  imagery  of  the  object  silhouette,  provided  that  there  is  discernible  contrast  (in  an  individual  band  or 
accumulated  over  bands)  between  target  and  background.  High  resolution  spectral  sensors  can  also  provide 
definition  of  individual  target  features  such  as  gun,  hatch,  wheels,  engine  compartments,  exhaust  ports,  etc.,  based  on 
color, intensity or thermal differences between target components. We have seen sensors operating in the reflective
region of the EO spectrum, from visible wavelengths to short-wave IR (SWIR), as well as thermal sensors operating
at either midwave (3-5 µm) or longwave (8-12 µm). Multi-spectral (MS) or hyper-spectral (HS) sensors cover the EO to
far IR spectral region, typically with 10 to 15 broad bands for MS sensors and 50 to hundreds of fine, narrow bands
for HS sensors. In the reflective (visible to near IR) and the thermal (8 to 12 µm) spectral regions, the added
information can improve performance primarily against camouflage, concealment, and countermeasures.

Each  of  the  two  sensors  can  provide  robust  ATC  performance  and  defeat  CC&D,  but  the  false  alarm  rate  is  higher 
than  desired.  The  fusion  of  information  extracted  from  these  two  sensors  will  provide  the  desired  ATC/ATR 
performance,  while  minimizing  the  false  alarm  rate  to  a  manageable  level.  This  is  due  to  the  complementary 
information across the two sensing domains and the differences in the sources of false alarms.


2.0  FUSION  METHODS 

Four levels of multiple sensor data fusion are described. The highest level of fusion occurs when the multiple images
are combined into a single image. Each location in the combined image has an associated vector of measurements
from each of the sensors. The new image is then processed by an algorithm (such as an ATC/R) that simultaneously
operates on the vector of values. Fusion techniques that operate in that mode are known as centralized data fusion
methods. They typically assume a common image projection plane for the multiple sensors and often rely on a high
level of image correlation. The centralized technique contains a fusion center in which the measurements or feature
vectors from each of the sensors are processed to form a global decision. One sensor configuration that is well
matched to this type of approach is a multi-spectral or hyper-spectral sensor. In this case, the data is highly
correlated for natural clutter and well aligned, thus providing the capability for fusing the image data at the pixel
level.


The  next  level  of  sensor  fusion  is  distributed  data  fusion.  In  distributed  fusion,  each  sensor  makes  an  independent 
decision  based  on  its  own  observations  and  passes  these  decisions  to  the  fusion  node  where  a  global  decision  is 
made.  Since  the  distributed  fusion  technique  transmits  less  information  to  the  fusion  node,  its  performance  may  be 
degraded  relative  to  the  centralized  approach.  This  approach  provides  a  more  practical  solution  to  near  real-time 
systems,  and  it  offers  maximum  benefit  when  there  is  simultaneous  sensing  by  the  various  sensors. 

An  alternate  approach  that  utilizes  the  advantages  from  both  the  centralized  and  distributed  techniques  is  termed 
hybrid  fusion.  Figure  1  illustrates  the  hybrid  fusion  architecture.  First,  each  sensor  makes  an  independent  report 
based  on  its  own  observations  or  features.  The  process  of  automatic  target  cueing  may  consist  of  only  the  first  stage 
of  detection  or  may  incorporate  the  second  stage  of  target  discrimination  (false  alarm  reduction)  for  each  of  the 
sensors  before  the  individual  decisions  are  made.  Thus,  a  list  of  candidate  targets  is  independently  generated  from 
each  sensor.  This  preliminary  detection  hypothesis  is  a  soft  decision.  The  combined  hypothesis  space  is  focused 
only  on  candidate  targets  that  appear  in  both  image  domains.  This  results  in  a  significant  reduction  in  the  number  of 
hypotheses  for  subsequent  processes.  It  also  simplifies  the  geolocation/registration  process,  since  there  is  only  a 
need  to  associate  cues. 


This approach is applicable only if the probability of detection is high in each sensor domain; otherwise the
probability of detection will be driven by the lowest performing sensor. If the false alarms are not correlated in
terms  of  location,  a  substantial  reduction  will  be  achieved  based  only  on  geolocation.  However,  even  if  the  false 
alarms  correlate  in  terms  of  position,  a  substantial  reduction  is  still  achievable  if  the  extracted  features  decorrelate. 
In  this  case  the  joint  distribution  of  the  extracted  features  will  provide  better  separability  (target/clutter)  than  each 
feature  set  alone. 

Figure 1: Hybrid Data Fusion Architecture

The  next  level  of  fusion,  termed  cued  fusion  (depicted  in  Figure  2),  designates  one  sensor  as  the  primary  sensor  and 
utilizes  the  second  sensor  for  false  alarm  reduction.  In  this  mode  of  fusion,  the  data  of  the  primary  sensor  is  used  to 
derive  an  initial  set  of  target  cues,  a  set  of  associated  features/attributes  and  the  level  of  confidence  based  on  the 
primary  sensor  data.  Since  ATC  is  performed  only  in  the  primary  sensor  domain,  the  corresponding  image  location 
for  each  target  cue  in  the  other  sensor  domain  is  estimated  by  automated  geolocation/registration  processes.  A  set  of 
target  features  is  derived  from  the  estimated  target  location  and  combined  with  the  attributes  extracted  from  the 
primary  sensor.  The  combined  set  of  features  is  passed  to  a  classifier  that  is  trained  on  the  joint  distribution.  A  final 
decision  is  made  by  combining  the  estimated  target  confidence  derived  from  the  primary  sensor  and  the  confidence 
based  on  the  joint  set  of  features.  The  cued  fusion  requires  either  better  geolocation  accuracy  than  the  hybrid 
approach  or  an  alternate  mechanism  for  refining  geolocation. 



Figure  2:  Cued  Fusion  Architecture 


The  lowest/minimal  level  of  sensor  fusion,  termed  hand-off  fusion,  is  a  mode  in  which  one  sensor  cues  a  second 
sensor  to  an  area  of  interest.  At  this  point  a  complete  hand-off  to  the  secondary  sensor  occurs,  and  all  of  the  imagery 
collected  by  this  sensor  is  independently  processed.  The  final  set  of  detection  cues  is  generated  from  data  of  the 
cued  sensor. 


3.0  FUSION  METHODS  FOR  SPECTRAL  AND  SAR  SENSORS 

Two of the five methods described above (hybrid fusion and cued fusion) present practical fusion processes that
provide performance improvements for SAR and hyper/multi-spectral (HS/MS) sensors. For SAR and HS
sensors, simultaneous sensing over the same region on the ground is likely to occur when the two sensors are
mounted on separate platforms. If the two sensors are located on a single platform, they may image the
same region with a time delay. SAR sensors operate at standoff ranges (side looking), while HS sensors can
image from nadir to lower depression angles. Since the sensing geometries are likely to be different, the fusion node
will have to correlate the decisions of each sensor. In this case, if the targets are over-resolved (even at coarse to
medium resolutions), the correlation does not have to achieve the accuracy required by the centralized fusion
process.

The  backbone  of  the  hybrid  and  cued  fusion  processes  is  a  target/clutter  classifier  that  either  operates  against  features 
extracted  from  a  single  sensor  or  against  the  combined  multi-sensor  feature  set.  Figure  3  illustrates  the  overall 
target/clutter  discrimination  process  that  is  applied  to  the  set  of  features.  It  consists  of  two  main  processes:  feature 
extraction  and  detection  classification.  The  selected  set  of  features  is  aimed  at  achieving  discrimination  between 
targets  and  natural  clutter.  A  training  set  is  used  to  estimate  target/clutter  statistics  required  for  actual  operation,  and 
the ground truth information is used to determine whether the object is a target or clutter for the training portion of
the  algorithm.  For  performance  evaluation  and  actual  operation  the  extracted  features  generated  from  either  test 
imagery  (not  included  in  the  training  set)  or  actual  imagery  are  processed  by  the  classifier  to  determine  the  class 
(target,  clutter)  of  the  detected  objects  and  the  confidence  level  for  that  decision. 

Figure 3: Feature-level Fusion Block Diagram


3.1  Cross  Sensor  Correlation  and  Geolocation  Accuracy 

In  order  to  perform  feature  level  fusion  (either  hybrid  or  cued)  there  is  a  need  to  determine  the  corresponding 
location of each point/object on the ground in each of the input images. Because of inaccuracies in the geolocation
process, an error basket (uncertainty region) proportional to the expected error has to be placed around each estimated location. More specifically, if a detection cue is
located in one image at location (x1, y1), it is transformed to the estimated location (x2, y2) in the second image.
Because of geolocation error, these two sets of coordinates may not correspond to the same point on the ground.
Therefore,  an  error  basket  is  used  to  improve  the  likelihood  that  there  is  an  object  located  within  this  uncertainty 
region  in  the  second  image. 
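
The error basket test can be illustrated with a minimal sketch. It assumes a circular basket of fixed radius and a caller-supplied coordinate mapping; the names `detections_in_basket`, `shift` and the sample coordinates are hypothetical and are not taken from the system described here.

```python
import math

def detections_in_basket(cue_xy, transform, detections, radius):
    """Return the detections of the second image that fall inside the error basket.

    cue_xy     -- (x1, y1) location of the cue in the first image
    transform  -- callable mapping (x1, y1) to the estimated (x2, y2)
    detections -- list of (x, y) detections reported in the second image
    radius     -- basket radius, chosen in proportion to the expected error
    """
    x2, y2 = transform(cue_xy)
    return [d for d in detections
            if math.hypot(d[0] - x2, d[1] - y2) <= radius]

# Hypothetical image-to-image mapping: a pure translation (assumption).
def shift(p):
    return (p[0] + 10.0, p[1] - 5.0)

sar_detections = [(112.0, 44.0), (150.0, 90.0), (108.0, 47.0)]
hits = detections_in_basket((100.0, 50.0), shift, sar_detections, radius=4.0)
print(hits)
```

A larger radius raises the chance that the true object lies inside the basket, at the cost of more candidate detections to disambiguate.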

3.2  Hybrid  Fusion 

There are three cases to be considered in hybrid fusion:

Case  1:  No  detection  is  generated  in  the  combined  hypothesis  space  when  a  detection  is  reported  in  one  image  but  no 
detection  is  reported  within  the  uncertainty  region  in  the  other  image. 

Case 2: If a single detection is reported within the uncertainty region of the other image, the features from both
sensors are combined to determine the object classification.

Case 3: If multiple detections are reported within the uncertainty region, the ambiguities are resolved through data
association: we determine the one detection report that best matches the report from the other sensor. More
specifically, we use the object discrimination/classification process to determine the correct data association. For
example, suppose that for a single detection in the spectral image there are three detections reported in the SAR uncertainty
region. A feature vector is generated for each of the three SAR objects. Another set of features is generated for the
single  object  in  the  spectral  image  domain.  These  features  are  combined  with  each  of  the  three  SAR  feature  vectors 
to  form  three  distinct  combined  feature  vectors.  Each  of  these  feature  vectors  is  examined  by  the  pre-trained 
target/clutter  classifier  to  determine  if  any  of  them  exhibits  target-like  characteristics.  If  all  of  them  are  classified  as 
clutter,  the  detection  cue  generated  at  this  location  is  rejected  as  a  false  alarm.  Otherwise,  a  detection  is  retained  with 
the  corresponding  combined  set  of  features  that  exhibited  the  strongest  target-like  behavior. 
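
The three cases can be sketched as follows. This is an illustration only: the cue format, the `target_score` callable standing in for the pre-trained target/clutter classifier, and the 0.5 threshold are all assumptions made for the example.

```python
import math

def hybrid_fuse(primary_cues, secondary_cues, radius, target_score, threshold=0.5):
    """Fuse two cue lists at the feature level.

    Each cue is (x, y, features); secondary coordinates are assumed to be
    already mapped into the primary image frame. target_score is a
    hypothetical stand-in for the pre-trained classifier: it returns a
    target-likeness score for a combined feature vector.
    """
    fused = []
    for x, y, f1 in primary_cues:
        in_basket = [f2 for u, v, f2 in secondary_cues
                     if math.hypot(u - x, v - y) <= radius]
        if not in_basket:
            continue  # Case 1: no matching detection, no fused cue
        # Cases 2 and 3: score every combined vector, keep the most target-like
        best = max((f1 + f2 for f2 in in_basket), key=target_score)
        if target_score(best) >= threshold:
            fused.append((x, y, best))
        # otherwise every candidate looked like clutter: reject as false alarm
    return fused

# Toy classifier (assumption): mean of the combined feature values.
def toy_score(features):
    return sum(features) / len(features)

spectral = [(10, 10, [0.9]), (50, 50, [0.8]), (80, 80, [0.2])]
sar = [(11, 10, [0.7]), (12, 11, [0.1]), (81, 80, [0.3])]
fused = hybrid_fuse(spectral, sar, radius=3.0, target_score=toy_score)
print(fused)
```

In this toy run the first spectral cue has two SAR candidates and keeps the stronger one, the second has no SAR match, and the third is rejected as clutter.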

3.3  Cued  Fusion 

In cued-level fusion, we cannot use the object discrimination approach to correct for geolocation errors, since
ATC is performed only on the primary sensor. Normally, this approach requires very accurate geolocation in order for
the feature-level fusion process to perform well. If the transformed coordinates of the object in the primary image
have a large geolocation error, the features extracted at the derived coordinates may not correspond to the
detected object in the primary image. The larger the registration error, the worse the feature representation.
However, it is possible to refine the initial estimate (within the error basket) using contrast. This approach was
evaluated under the MSET program, where no significant degradation was observed with reasonable
registration errors.
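
The text does not specify how contrast is used for the refinement. One possible reading, sketched below under that assumption, is to search a square error basket around the estimated location and move to the pixel that differs most from the mean of the basket. The function name and the contrast measure are hypothetical.

```python
def refine_by_contrast(image, est_row, est_col, half_width):
    """Return the (row, col) inside the error basket with the highest contrast.

    image      -- 2-D list of pixel values from the secondary sensor
    est_row/col -- location estimated by geolocation/registration
    half_width -- half-size of the square basket, in pixels
    Contrast is taken here as |pixel - basket mean| (an assumption).
    """
    rows, cols = len(image), len(image[0])
    r0, r1 = max(0, est_row - half_width), min(rows, est_row + half_width + 1)
    c0, c1 = max(0, est_col - half_width), min(cols, est_col + half_width + 1)
    window = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
    background = sum(image[r][c] for r, c in window) / len(window)
    return max(window, key=lambda rc: abs(image[rc[0]][rc[1]] - background))

# Toy 5x5 image with one bright pixel offset from the estimated location.
image = [[0] * 5 for _ in range(5)]
image[3][1] = 9
refined = refine_by_contrast(image, est_row=2, est_col=2, half_width=1)
print(refined)
```

Features for the secondary sensor would then be extracted at the refined location rather than at the initial estimate.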

3.4  Feature  Extraction 

The  key  to  the  success  of  the  two  data  fusion  approaches  is  the  feature-level  fusion  process  as  described  in  the 
previous  section.  The  most  important  part  of  the  feature-level  fusion  process  is  the  selection  of  features  for  both 
spectral  and  SAR  imagery  that  are  used  for  target/clutter  discrimination  and  to  mitigate  the  effects  of  registration 
error. 

The features can be grouped into three main categories: statistics-based, fractal-based and correlation-based. The
statistics-based features generally use amplitude-based statistics to characterize the detected area. Fractal-based
features estimate the fractal/non-fractal behavior of the detected area. The correlation-based features measure the level of spatial
correlation of targets and clutter. These features are combined into a single feature vector that is passed to the
pre-trained classifier to eliminate/reduce false alarms.
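
As a rough illustration of two of these categories, the sketch below computes a few amplitude statistics and a lag-1 spatial correlation estimate for a small image chip. The specific features (mean, standard deviation, peak, horizontal lag-1 correlation) are assumptions chosen for the example, since the text does not list the actual features, and fractal-based features are omitted.

```python
import math

def statistics_features(chip):
    """Amplitude statistics of a detected area given as a 2-D list of pixels."""
    values = [v for row in chip for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [mean, std, max(values)]

def lag1_correlation(chip):
    """Lag-1 autocorrelation estimate between horizontally adjacent pixels."""
    values = [v for row in chip for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0  # flat chip: correlation is undefined, report 0
    cov = sum((row[c] - mean) * (row[c + 1] - mean)
              for row in chip for c in range(len(row) - 1))
    return cov / var

smooth = [[1, 1, 5, 5], [1, 1, 5, 5]]    # spatially coherent chip
checker = [[1, 5, 1, 5], [5, 1, 5, 1]]   # same amplitudes, no coherence

# Combined single-sensor feature vector: statistics followed by correlation.
feature_vector = statistics_features(smooth) + [lag1_correlation(smooth)]
print(feature_vector)
```

The two chips share the same amplitude statistics but differ in spatial correlation, which is why the categories are complementary.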


3.5  Feature  Classification 


A  classifier  is  used  to  determine  the  separability  between  targets  and  clutter.  One  common  method  is  to  apply  a 
minimum  distance  classifier  that  is  equivalent  to  the  Bayesian  linear  classifier  when  the  feature  vector  elements  are 
uncorrelated,  Gaussian,  and  have  unit  variance.  Assuming  that  both  targets  and  clutter  are  equally  likely,  the 
decision  is  target  if 

$$P_1(F) < P_2(F) \tag{1}$$

where

$$P_i(F) = (F - m_i)(F - m_i)^T$$

$m_i$ = mean feature vector of the ith class ($m_1$ - mean of target class, $m_2$ - mean of clutter class)

$F$ = feature vector to be classified
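
A minimal sketch of the minimum distance rule in Eq. 1 follows. The training samples are made up for illustration, and the function names are hypothetical.

```python
def class_mean(samples):
    """Mean feature vector of a class, from a list of training feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def squared_distance(f, mean):
    """P_i(F) = (F - m_i)(F - m_i)^T for row vectors."""
    return sum((a - b) ** 2 for a, b in zip(f, mean))

def classify_min_distance(f, m_target, m_clutter):
    """Decide 'target' when the feature vector is closer to the target mean."""
    if squared_distance(f, m_target) < squared_distance(f, m_clutter):
        return "target"
    return "clutter"

# Made-up training data (assumption), two features per object.
m_target = class_mean([[4.0, 4.0], [6.0, 6.0]])
m_clutter = class_mean([[0.0, 0.0], [2.0, 2.0]])

print(classify_min_distance([4.0, 5.0], m_target, m_clutter))
print(classify_min_distance([1.0, 2.0], m_target, m_clutter))
```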

Eq. 1 ignores any correlation across features. However, band-to-band correlation exists in
hyperspectral image data, and it is exploited to improve clutter suppression. To account for the band-to-band
correlation, the general form of the Bayesian linear classifier, assuming normal distributions, is given by

$$P_i(F) = -0.5\,(F - m_i)\,\Sigma_i^{-1}\,(F - m_i)^T - 0.5 \ln|\Sigma_i| \tag{2}$$

where

$m_i$ = mean feature vector of the ith class ($m_1$ - mean of target class, $m_2$ - mean of clutter class)

$\Sigma_i$ = covariance matrix of each class

$|\Sigma_i|$ = determinant of the covariance matrix

$F$ = feature vector to be classified


For the combined spectral/SAR feature vector, $\Sigma_i$ becomes

$$\Sigma_i = \begin{bmatrix} \Sigma_{HS} & 0 \\ 0 & \Sigma_{SAR} \end{bmatrix} \tag{3}$$
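
Reading Eq. 3 as a block-diagonal covariance, with one block for the spectral (HS) features and one for the SAR features, the discriminant of Eq. 2 splits into a spectral term and a SAR term. The partitioned notation below ($F_{HS}$, $F_{SAR}$, $m_{i,HS}$, $m_{i,SAR}$) is an assumption introduced for this worked step and does not appear in the original.

```latex
% Assumed partition: F = [F_{HS}, F_{SAR}], m_i = [m_{i,HS}, m_{i,SAR}]
\begin{align*}
\Sigma_i^{-1} &= \begin{bmatrix} \Sigma_{HS}^{-1} & 0 \\ 0 & \Sigma_{SAR}^{-1} \end{bmatrix},
\qquad |\Sigma_i| = |\Sigma_{HS}|\,|\Sigma_{SAR}| \\
P_i(F) &= -0.5\,(F_{HS} - m_{i,HS})\,\Sigma_{HS}^{-1}\,(F_{HS} - m_{i,HS})^T - 0.5\ln|\Sigma_{HS}| \\
       &\quad -0.5\,(F_{SAR} - m_{i,SAR})\,\Sigma_{SAR}^{-1}\,(F_{SAR} - m_{i,SAR})^T - 0.5\ln|\Sigma_{SAR}|
\end{align*}
```

Under this reading, the zero off-diagonal blocks mean that spectral and SAR features are treated as uncorrelated with each other, so each sensor's covariance can be estimated and inverted separately.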


3.6  PNN  Classifier 

Equation  (2)  assumes  that  the  features  have  a  Gaussian  distribution.  This  assumption  is  not  generally  valid  for  real 
clutter.  In  addition,  for  a  large  number  of  features,  such  as  might  be  the  case  for  the  combination  of  spectral  and 
SAR  sensor  data,  the  inverse  of  the  covariance  matrix  may  not  have  a  stable  solution  (matrix  of  lower  rank  than  the 
number  of  features).  These  two  characteristics  of  classical  classifiers  have  motivated  the  selection  of  a  different  type 
of  classifier  to  perform  multi-sensor  target/clutter  discrimination.  More  specifically,  a  modified  version  of  a 
Probabilistic  Neural  Network  (PNN)  classifier  was  selected.  It  can  be  shown  that  it  maps  into  a  feed-forward  neural 
network structure typified by many simple parallel processors. Unlike a variety of neural network classifiers, the
training  time  for  the  PNN  is  negligible.  The  PNN  classifier  can  be  expressed  as  a  sum  of  Gaussian  probability 
density  functions  as  shown  below: 
The terms of this expression are defined as follows: the normalization constant is $(2\pi)^{-n/2}|\Sigma_i|^{-1/2}$, $i$ is the class number, $j$ is the training pattern number, $n$ is the number of features in the pattern (dimension space), $\Sigma_i$ is the covariance matrix of class $i$, and $F$ is the feature vector to be classified. The remaining quantities are the total number of training patterns, the jth training pattern from class $i$, and the covariance gain factor.
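
The sum-of-Gaussians idea can be sketched as below. This is not the modified PNN itself: it assumes a diagonal covariance built from per-feature variances, scaled by a user-chosen gain, and all names and training values are hypothetical.

```python
import math

def feature_variances(patterns):
    """Per-feature variance of the training patterns of one class."""
    n = len(patterns)
    cols = list(zip(*patterns))
    means = [sum(col) / n for col in cols]
    return [sum((x - m) ** 2 for x in col) / n for col, m in zip(cols, means)]

def pnn_density(f, patterns, variances, gain=1.0):
    """Average of Gaussian kernels centered on each training pattern.

    Uses a diagonal covariance (gain * per-feature variance), so no matrix
    inversion is needed. Variances are assumed to be non-zero.
    """
    scaled = [gain * v for v in variances]
    norm = 1.0 / math.sqrt(math.prod(2.0 * math.pi * v for v in scaled))
    total = 0.0
    for x in patterns:
        d2 = sum((a - b) ** 2 / v for a, b, v in zip(f, x, scaled))
        total += math.exp(-0.5 * d2)
    return norm * total / len(patterns)

def classify_pnn(f, targets, clutter, gain=1.0):
    """Pick the class with the larger estimated density (equal priors assumed)."""
    p_t = pnn_density(f, targets, feature_variances(targets), gain)
    p_c = pnn_density(f, clutter, feature_variances(clutter), gain)
    return "target" if p_t > p_c else "clutter"

# Made-up training patterns (assumption), two features per object.
targets = [[4.0, 5.0], [5.0, 4.0], [6.0, 6.0]]
clutter = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
print(classify_pnn([5.0, 5.0], targets, clutter))
print(classify_pnn([1.0, 1.0], targets, clutter))
```

Training here amounts to storing the patterns and computing per-feature variances, which is consistent with the negligible training time noted above.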


$P_i(F)$ is simply the sum of small multivariate Gaussian distributions centered on each training sample. However, the
sum is not limited to being Gaussian; in fact, it can approximate any smooth density function. The decision
boundary of the probability density function has been shown to asymptotically approach the Bayesian optimal
decision surface as the number of training patterns increases. For a large feature set, the inverse covariance may also
become unstable, thereby degrading performance. The standard Quadratic Bayesian classifier exhibits this effect
too. In order to avoid the inversion of the covariance matrix, only the variance of the individual feature distributions
(with a user-defined scaling factor) is used.

Figure 4 provides an example of hybrid feature-level fusion. It shows the
detection cues generated by AAEC's geometric whitening filter (GWF) overlaid on one of the multi-spectral
bands, and the SAR-derived detection cues overlaid on the SAR image. The combined fusion process was able to detect
the two camouflaged targets with a single false alarm.

Figure 4: Hybrid feature level fusion for false alarm reduction using SAR and MS (5 bands) imagery. Raw data with the output of single-sensor prescreening: top, single spectral band (1 of 5); bottom, SAR imagery (cues appear as white overlays). After hybrid feature fusion, the target cues are retained along with one remaining cue due to clutter. Overlapping ground coverage is marked by the white lines. The multiple sensor inputs are at approximately the same scale, but the sensing geometries are different.

